14 research outputs found
Imbalanced Ensemble Classifier for learning from imbalanced business school data set
Private business schools in India face a common problem of selecting quality
students for their MBA programs to achieve the desired placement percentage.
Generally, such data sets are biased towards one class, i.e., imbalanced in
nature. Learning from an imbalanced dataset is a difficult proposition.
This paper proposes an imbalanced ensemble classifier which can handle the
imbalanced nature of the dataset and achieves higher accuracy on the
feature selection (selection of important characteristics of students) cum
classification problem (prediction of placements based on the students'
characteristics) for an Indian business school dataset. The optimal value of an
important model parameter is found. Numerical evidence is also provided using
an Indian business school dataset to assess the performance of the
proposed classifier.
A Nonparametric Ensemble Binary Classifier and its Statistical Properties
In this work, we propose an ensemble of classification trees (CT) and
artificial neural networks (ANN). Several statistical properties including
universal consistency and upper bound of an important parameter of the proposed
classifier are shown. Numerical evidence is also provided using various real
life data sets to assess the performance of the model. Our proposed
nonparametric ensemble classifier does not suffer from the "curse of
dimensionality" and can be used in a wide variety of feature selection cum
classification problems. The proposed model performs better than many other
state-of-the-art models used in similar situations.
Real-time forecasts and risk assessment of novel coronavirus (COVID-19) cases: A data-driven analysis
The coronavirus disease 2019 (COVID-19) has become a public health emergency
of international concern affecting 201 countries and territories around the
globe. As of April 4, 2020, it has caused a pandemic outbreak with more than
1,116,643 confirmed infections and more than 59,170 reported deaths worldwide.
The main focus of this paper is two-fold: (a) generating short term (real-time)
forecasts of the future COVID-19 cases for multiple countries; (b) risk
assessment (in terms of case fatality rate) of the novel COVID-19 for some
profoundly affected countries by finding various important demographic
characteristics of the countries along with some disease characteristics. To
solve the first problem, we present a hybrid approach based on an autoregressive
integrated moving average (ARIMA) model and a wavelet-based forecasting model that can
generate short-term (ten days ahead) forecasts of the number of daily confirmed
cases for Canada, France, India, South Korea, and the UK. The predictions of
the future outbreak for different countries will be useful for the effective
allocation of health care resources and will act as an early-warning system for
government policymakers. For the second problem, we apply an optimal
regression tree algorithm to find essential causal variables that significantly
affect the case fatality rates for different countries. This data-driven
analysis provides insights into the study of early risk
assessments for 50 severely affected countries.
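As a rough illustration of the risk measure named above, the case fatality rate (CFR) is commonly defined as reported deaths divided by confirmed cases. The abstract does not state the exact formula the authors used, so the standard definition is assumed here. Both global counts are quoted as lower bounds ("more than"), so the result is only approximate:

```latex
\[
\mathrm{CFR}
  = \frac{\text{reported deaths}}{\text{confirmed cases}} \times 100\%
  \approx \frac{59{,}170}{1{,}116{,}643} \times 100\%
  \approx 5.3\%
\]
```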
Bayesian Neural Tree Models for Nonparametric Regression
Frequentist and Bayesian methods differ in many aspects, but share some basic
optimal properties. In real-life classification and regression problems,
situations exist in which a model based on one of the methods is preferable
based on some subjective criterion. Nonparametric classification and regression
techniques, such as decision trees and neural networks, have frequentist
(classification and regression trees (CART) and artificial neural networks) as
well as Bayesian (Bayesian CART and Bayesian neural networks) approaches to
learning from data. In this work, we present two hybrid models combining the
Bayesian and frequentist versions of CART and neural networks, which we call
the Bayesian neural tree (BNT) models. Both models exploit the architecture of
decision trees and have fewer parameters to tune than advanced
neural networks. Such models can simultaneously perform feature selection and
prediction, are highly flexible, and generalize well in settings with a limited
number of training observations. We study the consistency of the proposed
models, and derive the optimal value of an important model parameter. We also
provide illustrative examples using a wide variety of real-life regression data
sets.
A novel distribution-free hybrid regression model for manufacturing process efficiency improvement
This work is motivated by a particular problem of a modern paper
manufacturing industry, in which maximum efficiency of the fiber-filler
recovery process is desired. Many unwanted materials, along with valuable
fibers and fillers, come out as by-products of the paper manufacturing process
and mostly go to waste. The job of an efficient Krofta supracell is to
separate the unwanted materials from the valuable ones so that fibers and
fillers can be collected from the waste materials and reused in the
manufacturing process. The efficiency of Krofta depends on several crucial
process parameters, and monitoring them is a difficult proposition. To address
the low recovery percentage of the supracell, we propose a novel hybridization
of regression trees (RT) and artificial neural networks (ANN), the hybrid
RT-ANN model. This model is used to achieve the
goal of improving supracell efficiency, viz., gain in percentage recovery. In
addition, theoretical results for the universal consistency of the proposed
model are given with the optimal value of a vital model parameter. Experimental
findings show that the proposed hybrid RT-ANN model achieves higher accuracy in
predicting Krofta recovery percentage than other conventional regression models
for solving the Krofta efficiency problem. This work will help the paper
manufacturing company to become environmentally friendly with minimal
ecological damage and improved waste recovery.
An Interpretable Probabilistic Autoregressive Neural Network Model for Time Series Forecasting
Forecasting time series data is an emerging field of data science with
applications ranging from stock price and exchange rate prediction to
the early prediction of epidemics. Numerous statistical and machine learning
methods have been proposed in the last five decades with the demand for
generating high-quality and reliable forecasts. However, in real-life
prediction problems, situations exist in which a model based on one of the
above paradigms is preferable, and therefore, hybrid solutions are needed to
bridge the gap between classical forecasting methods and scalable neural
network models. We introduce an interpretable probabilistic autoregressive
neural network (PARNN) model, an explainable, scalable, and "white box-like"
framework that can handle a wide variety of irregular time series data (e.g.,
nonlinearity and nonstationarity). Sufficient conditions for asymptotic
stationarity and geometric ergodicity are obtained by considering the
asymptotic behavior of the associated Markov chain. During computational
experiments, PARNN outperforms standard statistical, machine learning, and deep
learning models on a diverse collection of real-world datasets coming from
economics, finance, and epidemiology, to mention a few. Furthermore, the
proposed PARNN model improves forecast accuracy significantly for 10 out of 12
datasets compared to state-of-the-art models for short to long-term forecasts.
Prediction of Transportation Index for Urban Patterns in Small and Medium-sized Indian Cities using Hybrid RidgeGAN Model
The rapid urbanization trend in most developing countries including India is
creating a plethora of civic concerns such as loss of green space, degradation
of environmental health, reduced clean water availability, air pollution, traffic
congestion leading to delays in vehicular transportation, etc. Transportation
and network modeling through transportation indices have been widely used to
understand transportation problems in the recent past. This necessitates
predicting transportation indices to facilitate sustainable urban planning and
traffic management. Recent advancements in deep learning research, in
particular, Generative Adversarial Networks (GANs), and their modifications in
spatial data analysis such as CityGAN, Conditional GAN, and MetroGAN have
enabled urban planners to simulate hyper-realistic urban patterns. These
synthetic urban universes mimic global urban patterns and evaluating their
landscape structures through spatial pattern analysis can aid in comprehending
landscape dynamics, thereby enhancing sustainable urban planning. This research
addresses several challenges in predicting the urban transportation index for
small and medium-sized Indian cities. A hybrid framework based on Kernel Ridge
Regression (KRR) and CityGAN is introduced to predict transportation index
using spatial indicators of human settlement patterns. This paper establishes a
relationship between the transportation index and human settlement indicators
and models it using KRR for the selected 503 Indian cities. The proposed hybrid
pipeline, which we call the RidgeGAN model, can evaluate the sustainability of urban
sprawl associated with infrastructure development and transportation systems in
sprawling cities. Experimental results show that the two-step pipeline approach
outperforms existing benchmarks based on spatial and statistical measures.
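The regression step mentioned above can be sketched with a closed-form kernel ridge regression. This is a minimal illustration, not the authors' RidgeGAN pipeline: the CityGAN stage is omitted, and the RBF kernel, the parameters `lam` and `gamma`, the helper names, and the synthetic "settlement indicator" data are all assumptions made for the example.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """RBF (Gaussian) kernel matrix between the rows of a and the rows of b."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq, 0.0))

def krr_fit(x, y, lam=1e-3, gamma=1.0):
    """Solve (K + lam * I) alpha = y for the dual coefficients alpha."""
    k = rbf_kernel(x, x, gamma)
    return np.linalg.solve(k + lam * np.eye(len(x)), y)

def krr_predict(x_train, alpha, x_new, gamma=1.0):
    """Predict as a kernel-weighted sum over the training points."""
    return rbf_kernel(x_new, x_train, gamma) @ alpha

# Hypothetical data: two made-up spatial indicators per city and a
# synthetic target standing in for a transportation index.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(40, 2))
y = np.sin(3.0 * x[:, 0]) + 0.5 * x[:, 1]

alpha = krr_fit(x, y)
pred = krr_predict(x, alpha, x)
```

Larger values of `lam` shrink the fit towards a smoother function, and `gamma` controls how local the kernel is. In practice both would be chosen by cross-validation.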
Epicasting: An Ensemble Wavelet Neural Network (EWNet) for Forecasting Epidemics
Infectious diseases remain among the top contributors to human illness and
death worldwide, among which many diseases produce epidemic waves of infection.
The unavailability of specific drugs and ready-to-use vaccines to prevent most
of these epidemics makes the situation worse. This forces public health
officials and policymakers to rely on early warning systems generated by
reliable and accurate forecasts of epidemics. Accurate forecasts of epidemics
can assist stakeholders in tailoring countermeasures, such as vaccination
campaigns, staff scheduling, and resource allocation, to the situation at hand,
which could translate to reductions in the impact of a disease. Unfortunately,
most of these past epidemics exhibit nonlinear and non-stationary
characteristics due to their spreading fluctuations based on seasonal-dependent
variability and the nature of these epidemics. We analyse a wide variety of
epidemic time series datasets using a maximal overlap discrete wavelet
transform (MODWT) based autoregressive neural network, which we call the EWNet model.
MODWT techniques effectively characterize non-stationary behavior and seasonal
dependencies in the epidemic time series and improve the nonlinear forecasting
scheme of the autoregressive neural network in the proposed ensemble wavelet
network framework. From a nonlinear time series viewpoint, we explore the
asymptotic stationarity of the proposed EWNet model to show the asymptotic
behavior of the associated Markov chain. We also theoretically investigate the
effect of learning stability and the choice of hidden neurons in the proposal.
From a practical perspective, we compare our proposed EWNet framework with
several statistical, machine learning, and deep learning models. Experimental
results show that the proposed EWNet is highly competitive compared to the
state-of-the-art epidemic forecasting methods.
Semiparametric Survival Analysis of 30-Day Hospital Readmissions with Bayesian Additive Regression Kernel Model
In this paper, we introduce a kernel-based nonlinear Bayesian model for a right-censored survival outcome data set. Our kernel-based approach provides a flexible nonparametric modeling framework to explore nonlinear relationships between predictors and right-censored survival outcome data. Our proposed kernel-based model is shown to provide excellent predictive performance via several simulation studies and real-life examples. Unplanned hospital readmissions greatly impair patients’ quality of life and have imposed a significant economic burden on American society. In this paper, we focus our application on predicting 30-day readmissions of patients. Our survival Bayesian additive regression kernel model (survival BARK or sBARK) improves the timeliness of readmission preventive intervention through a data-driven approach.